Variable Selection in Nonparametric Additive Models.
Abstract
We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is "small" relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with B-spline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with the underlying model, and the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method.
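The two-stage procedure described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the objective scaling `1/(2n)`, the proximal-gradient solver, the weights `1/||beta_j||` (infinite for groups zeroed in the first stage), the choice of penalty level as a fraction of the largest group score, and the simulated data are all assumptions made here for illustration. The names `bspline_basis` and `group_lasso` are hypothetical helpers.

```python
import numpy as np

def bspline_basis(x, n_interior=3, degree=3):
    """Centered B-spline design matrix for one covariate on [0, 1]."""
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    x = np.clip(x, 0.0, 1.0 - 1e-12)
    # Degree-0 basis: indicators of the knot intervals.
    B = np.array([(knots[i] <= x) & (x < knots[i + 1])
                  for i in range(len(knots) - 1)], dtype=float).T
    # Cox-de Boor recursion up to the requested degree.
    for d in range(1, degree + 1):
        new = np.zeros((len(x), len(knots) - d - 1))
        for i in range(new.shape[1]):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                new[:, i] += (x - knots[i]) / left * B[:, i]
            if right > 0:
                new[:, i] += (knots[i + d + 1] - x) / right * B[:, i + 1]
        B = new
    B = B[:, 1:]  # the full basis sums to one; drop a column before centering
    return B - B.mean(axis=0)

def group_lasso(Z, y, groups, lam, weights=None, n_iter=2000):
    """Proximal gradient for (1/2n)||y - Z b||^2 + lam * sum_j w_j ||b_j||_2."""
    n, p = Z.shape
    weights = np.ones(len(groups)) if weights is None else weights
    step = n / np.linalg.norm(Z, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        v = beta + step * Z.T @ (y - Z @ beta) / n
        for g, w in zip(groups, weights):
            norm = np.linalg.norm(v[g])
            thresh = step * lam * w
            # Group soft-thresholding: shrink the whole block or set it to zero.
            v[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * v[g]
        beta = v
    return beta

# Simulated data (an assumption for illustration): 10 covariates, 2 nonzero components.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.uniform(size=(n, p))
y = 2 * np.sin(2 * np.pi * X[:, 0]) + 2 * np.cos(2 * np.pi * X[:, 1])
y = y + 0.3 * rng.normal(size=n)
y = y - y.mean()

blocks = [bspline_basis(X[:, j]) for j in range(p)]
Z = np.hstack(blocks)
m = blocks[0].shape[1]
groups = [np.arange(j * m, (j + 1) * m) for j in range(p)]

# Penalty level chosen as a fraction of the largest group score (a heuristic).
lam = 0.1 * max(np.linalg.norm(Z[:, g].T @ y) / n for g in groups)

# Stage 1: group Lasso gives an initial estimator and reduces the dimension.
beta_init = group_lasso(Z, y, groups, lam)
norms = [np.linalg.norm(beta_init[g]) for g in groups]
selected_initial = [j for j, nm in enumerate(norms) if nm > 0]

# Stage 2: adaptive group Lasso with weights from the initial estimator.
weights = np.array([1.0 / nm if nm > 0 else np.inf for nm in norms])
beta_adapt = group_lasso(Z, y, groups, lam, weights)
selected_adaptive = [j for j, g in enumerate(groups)
                     if np.linalg.norm(beta_adapt[g]) > 0]

print("group Lasso selects:", selected_initial)
print("adaptive group Lasso selects:", selected_adaptive)
```

Because groups zeroed in the first stage receive infinite weight, the second stage can only keep a subset of the components chosen by the first. In practice the penalty levels would be tuned, for example by an information criterion or cross-validation, rather than fixed as above.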
Similar resources
Variable Selection in Nonparametric and Semiparametric Regression Models
This chapter reviews the literature on variable selection in nonparametric and semiparametric regression models via shrinkage. We highlight recent developments on simultaneous variable selection and estimation through the methods of least absolute shrinkage and selection operator (Lasso), smoothly clipped absolute deviation (SCAD) or their variants, but restrict our attention to nonparametric a...
Estimation and Variable Selection for Semiparametric Additive Partial Linear Models (SS-09-140).
Semiparametric additive partial linear models, containing both linear and nonlinear additive components, are more flexible compared to linear models, and they are more efficient compared to general nonparametric regression models because they reduce the problem known as "curse of dimensionality". In this paper, we propose a new estimation approach for these models, in which we use polynomial sp...
Nonparametric Greedy Algorithms for the Sparse Learning Problem
This paper studies the forward greedy strategy in sparse nonparametric regression. For additive models, we propose an algorithm called additive forward regression; for general multivariate models, we propose an algorithm called generalized forward regression. Both algorithms simultaneously conduct estimation and variable selection in nonparametric settings for the high dimensional sparse learni...
Variable Selection in High-dimensional Additive Models Based on Norms of Projections
We consider the problem of variable selection in high-dimensional sparse additive models. We focus on the case that the components belong to nonparametric classes of functions. The proposed method is motivated by geometric considerations in Hilbert spaces and consists of comparing the norms of the projections of the data onto various additive subspaces. Under minimal geometric assumpti...
Discussion of “Random Rates in Anisotropic Regression” by Hoffmann and Lepski
We congratulate the authors for a stimulating paper (referred to as HL in the following). As the authors correctly stated, the number of variables does not affect the optimal rate of convergence in a regular parametric model, but it does affect the optimal rate of convergence in nonparametric models. To be more precise, the optimal rate of convergence in a nonparametric function estimation prob...
Variable Selection in Additive Models by Nonnegative Garrote
We adapt Breiman’s (1995) nonnegative garrote method to perform variable selection in nonparametric additive models. The technique avoids methods of testing for which no reliable distributional theory is available. In addition it removes the need for a full search of all possible models, something which is computationally intensive, especially when the number of variables is moderate to high. T...
Journal:
- Annals of Statistics
Volume 38, Issue 4
Pages: -
Publication date: 2010